Linguistic Technologies Applied Lexicography and Scientific Text Corpora

نویسنده

  • Larisa Beliaeva
چکیده

Nowadays applied lexicography is a special domain of applied linguistics and language engineering in the framework of problemoriented automated and automatic dictionaries and databases. Modern approach to dictionary creation assumes preliminary work with parallel or comparable text corpora to be considered as reference database for solving both research and practical lexicographic problems. Parallel text corpora are not always available. One of the options is to create a source lexicographical material as a text corpus with parallel presentation of initial texts, their machine translations and post-editing results. Analysis of comparable text corpus permits to reveal the set of terminological collocations (mostly noun phrases) on the translation level. The paper considers this process on the example of creating a dictionary on the Bologna process domain. The procedure permits to specify translations of lexical units in large text collections and to reveal the domain structure and its terminological system. This idea is shown on the examples of analysis for the collocations with component “higher education”, being the most frequent in the Bologna process text corpora.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Towards an integrated representation of multiple layers of linguistic annotation in multilingual corpora

There has been an increasing interest in recent years in the enrichment of natural language corpora in terms of annotation with explicit linguistic information. This interest manifests itself most prominently in two areas of linguistics: corpus linguistics and computational linguistics. For corpus linguistics, the long standing practice has been to work on raw, i.e., unannotated text. While raw...

متن کامل

Parallel Corpora, Alignment Technologies and Further Prospects in Multilingual Resources and Technology Infrastructure

Multilingual technologies, which to a large extent are language independent, provide a powerful support for easier building of annotated linguistic resources for languages where such resources are scarce or missing. All these technologies require parallel corpora in order to achieve their ends. Parallel texts encode extremely valuable linguistic knowledge because the linguistic decisions made b...

متن کامل

A Cross-linguistic and Cross-cultural Study of Epistemic Modality Markers in Linguistics Research Articles

Epistemic modality devices are believed to be one of the prominent characteristics of research articles as the commonly used genre among the academic community members. Considering the importance of such devices in producing and comprehending scientific discourse, this study aimed to cross–culturally and cross-linguistically investigate epistemic modality markers as an important subcategory...

متن کامل

Preparation and Analysis of Linguistic Corpora

The corpus is a fundamental tool for any type of research on language. The availability of computers in the 1950’s immediately led to the creation of corpora in electronic form that could be searched automatically for a variety of language features and compute frequency, distributional characteristics, and other descriptive statistics. Corpora of literary works were compiled to enable stylistic...

متن کامل

Morphologically and Syntactically Annotated Corpora of Many Languages

Annotated corpora have become a standard resource for research in both linguistics and computational processing of natural languages. Lexicographers judge word usage and distribution by occurrences in corpora; part-of-speech tags may help them narrow their queries. Grammarians may use syntactically annotated corpora (treebanks) for queries such as “show me all examples where a verb governs two ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014